Descriptive Statistics

Frequency Distributions

  • FREQUENCY DISTRIBUTION
  • RELATIVE FREQUENCY DISTRIBUTION
  • PROPORTION
  • PERCENTAGE
  • CUMULATIVE FREQUENCY DISTRIBUTION
  • RATE
  • BAR GRAPH
  • HISTOGRAM
  • LINE GRAPH
  • STATISTICAL MAP

Objectives

  • Calculate proportions and percentages
  • Construct and analyze frequency, percentage, and cumulative distributions

DISTRIBUTION

Shows all the possible values (or intervals) of the data and how often they occur.


FREQUENCY DISTRIBUTION

A table reporting the number of observations falling into each category of the variable.

Table 1. Attitudes about sex before marriage

premarsx                    n
always wrong              357
almost always wrong       122
wrong only sometimes      258
not wrong at all        1,378
Total                   2,115

Survey question: There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and woman have sex relations before marriage, do you think it is _________.

The Total row (2,115) is the number of respondents who answered this survey question.

The count of 258 is the number of respondents who said pre-marital sex was “wrong only sometimes.”

RELATIVE FREQUENCY DISTRIBUTION

A table showing the proportion or percentage for each value of a variable.


Proportions are between 0 and 1.0.

Proportion = count (f) / total number of cases (N).


Percentages are between 0 and 100.

Percentage = proportion × 100.
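The two formulas above can be applied to the counts in Table 1. This is a minimal sketch; the variable names (`counts`, `proportions`, `percentages`) are illustrative choices, not part of the original slides.

```python
# Counts (f) from Table 1, keyed by category of premarsx.
counts = {
    "always wrong": 357,
    "almost always wrong": 122,
    "wrong only sometimes": 258,
    "not wrong at all": 1378,
}

total = sum(counts.values())  # N, the total number of cases

# Proportion = f / N; percentage = proportion * 100.
proportions = {category: f / total for category, f in counts.items()}
percentages = {category: p * 100 for category, p in proportions.items()}

for category in counts:
    print(f"{category:<22}{proportions[category]:>8.3f}{percentages[category]:>8.1f}%")
```

The proportions sum to 1.0 and the percentages sum to 100, which is a quick check that no category was left out.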

CUMULATIVE FREQUENCY DISTRIBUTION

The number or percentage of observations at or below a given category.


Table 3. Attitudes about sex before marriage, with cumulative percentages

premarsx                    n      %   cumulative %
always wrong              357     17             17
almost always wrong       122      6             23
wrong only sometimes      258     12             35
not wrong at all        1,378     65            100
Total                   2,115    100

Cumulative % for “almost always wrong”: \(17\% + 6\% = 23\%\)
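The cumulative column can be reproduced with a running total. The sketch below accumulates the raw counts first and converts to percentages afterwards, which is one reasonable way to avoid compounding rounding error; the variable names are illustrative.

```python
from itertools import accumulate

# Categories in their ordinal order, with counts from the table.
categories = [
    "always wrong",
    "almost always wrong",
    "wrong only sometimes",
    "not wrong at all",
]
counts = [357, 122, 258, 1378]

total = sum(counts)

# Running total of n: observations at or below each category.
cumulative_counts = list(accumulate(counts))

# Convert each running total to a percentage of all cases.
cumulative_pct = [100 * c / total for c in cumulative_counts]

for category, pct in zip(categories, cumulative_pct):
    print(f"{category:<22}{pct:>6.0f}%")
```

The last cumulative percentage is always 100, because every observation is at or below the highest category.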

RATES

\(\text{Rate} = \frac{\text{actual occurrences}}{\text{possible occurrences}}\)


Examples:

Nominal variables:
can have frequency distributions, but not cumulative frequency distributions (the categories have no inherent order to accumulate over)


Ordinal variables:
can have frequency distributions and cumulative frequency distributions


Interval-ratio variables:
can have frequency distributions, cumulative frequency distributions, and rates

A bar graph is used:
for nominal or ordinal variables,

to show frequencies or percentages,

using separated rectangles, with height proportional
to the frequency or percentage.

A histogram is used:
for interval-ratio variables,

to show frequencies or percentages,

using contiguous (touching) rectangles, with height proportional
to the frequency or percentage.

A line graph is used:
for interval-ratio variables,

to show frequencies or percentages,

joining the frequency or average for each category with a line.

A statistical map is used:
for interval-ratio variables,

to show geographical variations, often in ratios,

using variation in color or hue.

Central Tendency

  • MEAN
  • MEDIAN
  • MODE
  • OUTLIER
  • MOVING AVERAGE
  • PERCENTILE
  • BIMODAL
  • SYMMETRICAL DISTRIBUTION
  • POSITIVELY SKEWED DISTRIBUTION
  • NEGATIVELY SKEWED DISTRIBUTION

Objectives

  • Explain the importance of measures of central tendency.
  • Calculate and interpret the mean, the median, and the mode.
  • Identify the relative strengths and weaknesses of the three measures.
  • Determine and explain the shape of a distribution.

Summary statistics


We use summary statistics to find out what is TYPICAL in a distribution.

Mean


MEAN

The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.


  • the most commonly used measure of central tendency
  • its weakness is that it is sensitive to outliers (extreme scores in a distribution)

Mean: Calculation

Finding the mean in a list: \(7, 4, 2, 8, 0, 9, 5\)

  1. Add all observations together: \(7 + 4 + 2 + 8 + 0 + 9 + 5 = 35\)
  2. Divide the sum by the number of observations: \(\frac{35}{7} = 5\)

Family ID   Annual Income (CAD)
F01         $48,000
F02         $52,000
F03         $45,000
F04         $50,000
F05         $53,000
F06         $49,000
F07         $46,000
F08         $51,000
F09         $175,000
F10         $250,000

Most families in this sample earn between $45,000 and $53,000, but two high-income households pull the mean up to $81,900, far above what is typical.
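The effect of the two high incomes can be checked directly. This sketch uses Python's standard `statistics` module; the variable names and the comparison against the median are illustrative additions, not part of the original slides.

```python
from statistics import mean, median

# The short list from the calculation steps.
scores = [7, 4, 2, 8, 0, 9, 5]
print(mean(scores))  # 35 / 7 = 5

# Annual incomes (CAD) for families F01 to F10.
incomes = [48_000, 52_000, 45_000, 50_000, 53_000,
           49_000, 46_000, 51_000, 175_000, 250_000]

mean_income = mean(incomes)      # 819,000 / 10 = 81,900
median_income = median(incomes)  # (50,000 + 51,000) / 2 = 50,500

# Mean of the eight families at or below $53,000, i.e. without the two outliers.
mean_without_outliers = mean(x for x in incomes if x <= 53_000)  # 394,000 / 8 = 49,250

print(mean_income, median_income, mean_without_outliers)
```

Dropping the two outliers moves the mean from $81,900 to $49,250, while the median of the full sample ($50,500) was already close to the typical family.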

Median

MEDIAN

The score that divides the distribution into two equal parts, so that half the cases fall above it and half below.

The median is the value at the 50th percentile in a cumulative frequency distribution.

PERCENTILE

A score below which a specific percentage of the distribution falls.
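For an ordinal variable, a percentile can be read off the cumulative distribution. The sketch below uses the premarsx counts and a hypothetical helper, `category_at_percentile`, which assumes the convention of returning the first category whose cumulative percentage reaches the target.

```python
from itertools import accumulate

# Categories in ordinal order, with counts from the premarsx tables.
categories = [
    "always wrong",
    "almost always wrong",
    "wrong only sometimes",
    "not wrong at all",
]
counts = [357, 122, 258, 1378]


def category_at_percentile(categories, counts, percentile):
    """Return the first category whose cumulative percentage reaches `percentile`.

    Hypothetical helper for illustration; assumes `categories` is already
    sorted from lowest to highest.
    """
    total = sum(counts)
    for category, running in zip(categories, accumulate(counts)):
        if 100 * running / total >= percentile:
            return category
    raise ValueError("percentile must be between 0 and 100")


# The median is the category at the 50th percentile.
print(category_at_percentile(categories, counts, 50))  # not wrong at all
```

Only 35% of respondents fall at or below “wrong only sometimes,” so the 50th percentile lands in “not wrong at all.”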

Median: Calculation

Finding the median in a list with an odd number of observations:

\(7, 2, 1, 3, 4, 1, 5, 9, 2\)

  1. Put the list in order: \(1, 1, 2, 2, 3, 4, 5, 7, 9\)
  2. Pick the center number: \(3\)

Finding the median in a list with an even number of observations:

\(2, 0, 1, 2, 5, 1, 3, 1\)

  1. Put the list in order: \(0, 1, 1, 1, 2, 2, 3, 5\)
  2. Add the two center numbers & divide by 2: \(\frac{1 + 2}{2} = 1.5\)
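Both procedures above can be sketched as one function. The name `median_of` is an illustrative choice; in practice `statistics.median` from the standard library does the same job.

```python
def median_of(values):
    """Return the median by sorting and picking the center value(s)."""
    ordered = sorted(values)  # step 1: put the list in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single center number.
        return ordered[mid]
    # Even count: average the two center numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 2, 1, 3, 4, 1, 5, 9, 2]))  # 3
print(median_of([2, 0, 1, 2, 5, 1, 3, 1]))     # 1.5
```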

Mode


MODE

Category or score with the highest frequency (or percentage) in a distribution.


BIMODAL

A distribution in which two values or categories share the highest frequency.

Mode: Calculation

Finding the mode in a list: \(7, 2, 1, 3, 4, 1, 5, 1, 2\)

  1. Put the list in order: \(1, 1, 1, 2, 2, 3, 4, 5, 7\)
  2. Pick the most frequent number: \(1\)
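The steps above amount to counting how often each value occurs. The sketch below returns every value tied for the highest frequency, so it also covers the bimodal case; the function name `modes` and the second example list are hypothetical, chosen only for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    if not values:
        raise ValueError("mode of an empty list is undefined")
    frequency = Counter(values)        # value -> number of occurrences
    highest = max(frequency.values())  # the largest frequency observed
    return sorted(v for v, f in frequency.items() if f == highest)


# The list from the slide: 1 occurs three times.
print(modes([7, 2, 1, 3, 4, 1, 5, 1, 2]))  # [1]

# A hypothetical bimodal list: 1 and 2 each occur twice.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```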